Streamline sparse cosine dot-product lookups (#195)
Merged commit 6ff01af into bolt-tfidf-sparse-dict-intersection-5659582442749844468
Pull request overview
This PR optimizes the TF‑IDF cosine similarity hot loop used by semantic chunking by reducing sparse-vector dot-product lookups from two dictionary operations (`in` + `[]`) to a single sentinel-backed `dict.get`, and documents the pattern for future use.
Changes:
- Update `cosine_similarity` to use a sentinel-backed `v2.get(k, sentinel)` during sparse dot-product accumulation.
- Update `.jules/bolt.md` to recommend the single-lookup sparse-dict intersection pattern for performance-critical paths.
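For context, a minimal sketch of what the optimized helper could look like. The function name `cosine_similarity` and the file `app/rag/simple_index.py` come from the PR; the body below is an illustrative reconstruction, not the repository's exact code:

```python
import math


def cosine_similarity(v1: dict, v2: dict) -> float:
    """Cosine similarity between two sparse TF-IDF vectors (term -> weight)."""
    if len(v1) > len(v2):
        v1, v2 = v2, v1  # iterate over the smaller dict (prior optimization)
    missing = object()  # sentinel: an absent key is distinct from a stored 0.0
    dot = 0.0
    for k, v in v1.items():
        other = v2.get(k, missing)  # one lookup instead of `in` + `[]`
        if other is not missing:
            dot += v * other
    if dot == 0.0:
        return 0.0
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2)
```

The sentinel is a fresh `object()` so that `is not missing` can never be confused with any weight a caller might store, including falsy values like `0.0`.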
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| app/rag/simple_index.py | Collapses membership + fetch into one dict lookup per key in the sparse dot-product loop. |
| .jules/bolt.md | Documents the sentinel-backed dict.get pattern as the preferred sparse intersection optimization. |
```python
missing = object()  # sentinel: distinguishes an absent key from a stored falsy weight
dot = 0.0
for k, v in v1.items():
    other = v2.get(k, missing)  # single lookup instead of `in` + `[]`
    if other is not missing:
        dot += v * other
```
Squashed commit history:

* Optimize TF-IDF cosine similarity dictionary intersection (#193): Avoid allocating expensive sets and performing set intersection when calculating cosine similarity for sparse dictionaries (like TF-IDF scores). Instead, iterate over the smaller dictionary's items directly, checking for key existence in the larger dictionary. This provides roughly a 30-40% speed-up in this calculation hot loop during chunking.
* Streamline sparse cosine dot-product lookups (#195)
* Initial plan
* Optimize sparse cosine similarity lookups

Co-authored-by: openai-code-agent[bot] <242516109+Codex@users.noreply.github.com>
Co-authored-by: Codex <242516109+Codex@users.noreply.github.com>
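The commits above describe two stacked optimizations: first dropping set intersection in favor of iterating the smaller dict, then collapsing the remaining membership check and fetch into one lookup. A hedged sketch contrasting the two dot-product strategies (helper names are illustrative, not from the repository):

```python
def dot_via_sets(v1: dict, v2: dict) -> float:
    # Older approach: allocate key-view intersection, then index both dicts.
    return sum(v1[k] * v2[k] for k in v1.keys() & v2.keys())


def dot_via_iteration(v1: dict, v2: dict) -> float:
    # Optimized approach: iterate the smaller dict, one get() per key,
    # no intermediate set allocation.
    if len(v1) > len(v2):
        v1, v2 = v2, v1
    missing = object()
    total = 0.0
    for k, v in v1.items():
        other = v2.get(k, missing)
        if other is not missing:
            total += v * other
    return total
```

Both produce identical results; the iteration version simply avoids building a temporary set whose size is proportional to the key overlap.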
Frequent TF-IDF cosine similarity calls were still performing two dictionary lookups per key in the hot loop despite the prior intersection optimization, adding needless hashing overhead. This change uses a sentinel-backed `dict.get` to collapse membership + fetch into one lookup while preserving zero-value correctness. Example:
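A minimal before/after sketch of the single-lookup pattern (the vector contents are illustrative):

```python
# Sample sparse TF-IDF vectors (term -> weight)
v1 = {"cat": 0.5, "dog": 1.5}
v2 = {"dog": 2.0, "fish": 0.0}

# Before: two hash lookups per shared key (`in`, then `[]`)
dot_before = 0.0
for k, v in v1.items():
    if k in v2:
        dot_before += v * v2[k]

# After: one lookup via a sentinel-backed get(); the sentinel
# distinguishes an absent key from a stored 0.0 weight
missing = object()
dot_after = 0.0
for k, v in v1.items():
    other = v2.get(k, missing)
    if other is not missing:
        dot_after += v * other
```

Both loops compute the same dot product; the second hashes each key once instead of twice.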